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We consider a single genetic locus which carries two alleles, la- 
^\ ■ belled P and Q. This locus experiences selection and mutation. It is 

linked to a second neutral locus with recombination rate r. If r = 0, 
this reduces to the study of a single selected locus. Assuming a Moran 
model for the population dynamics, we pass to a diffusion approxi- 
P^ ' mation and, assuming that the allele frequencies at the selected locus 

(-H , have reached stationarity, establish the joint generating function for 

the genealogy of a sample from the population and the frequency 

of the P allele. In essence this is the joint generating function for a 

coalescent and the random background in which it evolves. We use 

this to characterize, for the diffusion approximation, the probability 

of identity in state at the neutral locus of a sample of two individuals 

►> ■ (whose type at the selected locus is known) as solutions to a system 

.^4- ' of ordinary differential equations. The only subtlety is to find the 

[-•»« , boundary conditions for this system. Finally, numerical examples are 

presented that illustrate the accuracy and predictions of the diffusion 
\Q ' approximation. In particular, a comparison is made between this ap- 

^^ . proach and one in which the frequencies at the selected locus are 

\l estimated by their value in the absence of fluctuations and a classical 

^-' ' structured coalescent model is used. 



d ■ 1. Introduction. The coalescent process was introduced by Kingman 

(1982) as a simple and elegant description of the genealogical relationships 
amongst a set of neutral genes. Although the two theories have developed 
largely independently, the coalescent is closely related to the classical con- 
cept of identity by descent introduced independently by Cotterman and 



C3 ■ Malecot [see Nagylaki (1989) for a survey]. The original coalescent applies 
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to the case of a single panmictic population of constant size, but it extends 
naturally to describe populations that vary with time or to structured pop- 
ulations in which genes may be found in different places or embedded in 
different genetic backgrounds. By considering the ancestral selection graph, 
various forms of selection can also be incorporated [Krone and Neuhauser 
(1997) and Donnelly and Kurtz (1999)]. However, as the genetic sophisti- 
cation increases, not only are analytic results unattainable but also the ap- 
proach becomes increasingly computationally intensive. Moreover, the pow- 
erful results of Donnelly and Kurtz rest upon exchangeability of the sample. 
This means that they lend themselves to describing the genealogy of a ran- 
dom sample from the population. In the problem that we are concerned with 
here, the sample is not random. 

The particular problem that we are concerned with is the following. Sup- 
pose that selection acts on a single locus which carries two alleles labelled 
P and Q. There is also a strictly positive mutation rate between these two 
alleles so that neither becomes fixed in the population. The selected locus 
is linked to a second neutral locus with recombination rate r. One can then 
ask about the genealogy of a sample from the neutral locus. If we know the 
type of each individual in the sample at the selected site, then we have a 
sample from known locations in a structured population, with two denies 
(determined by the P or Q allele) in which the population size fluctuates 
randomly. Recombination and mutation from P to Q both contribute to 
migration between the denies, while mutation and selection determine the 
population sizes. Setting the recombination rate r = we recover the case 
of a single selected locus. 

For certain forms of selection (directional and balancing) one can address 
this problem using the ancestral selection graphs of Krone and Neuhauser 
(1997). Such graphs trace the lineages of "potential ancestors" of a sam- 
ple from the selected locus. As one traces backwards in time lineages can 
"branch" as well as "coalesce." On reaching the most recent common an- 
cestor of all such potential lineages, one then traces back through the graph 
culling those that in fact did not contribute to the sample. However, this 
method restricts the form of the selection and, as observed in Przeworski, 
Charlesworth and Wall (1999), is also computationally demanding, espe- 
cially if selection is strong, because of the proliferation of potential lineages. 

In fact the most common approach to our problem is to assume that fluc- 
tuations are sufficiently small that we can approximate the allele frequencies 
at the selected locus by their value in the absence of fluctuations and then 
model the genealogy at the neutral locus by the structured coalescent with 
constant deme sizes [Kaplan, Darden and Hudson (1988), Notohara (1990), 
Herbots (1997) and Nordborg (1997)]. One might expect this procedure to 
give good approximations to quantities such as the mean time to the most 
recent common ancestor of the sample, which is very robust, but not to the 
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variance of the same quantity. In fact, in Section 7 we illustrate that for 
some parameter values even the mean time to coalescence for a sample of 
size two can be ill-approximated by this procedure. 

Here we adopt an alternative approach, more akin to the classical one. 
We retain the stochastic fluctuations in the two deme sizes. Although we 
first establish a coalescent-like diffusion approximation for the genealogy of 
the sample, we actually express the quantities that we are interested in as 
solutions to a system of ordinary differential equations and our numerical ex- 
amples will all be obtained by numerically solving this system. In Lemma 3.1 
we identify the appropriate "coalescent." There are no surprises here, the 
model being entirely analogous to a structured coalescent with two demes, 
but now the jump rates in the coalescent are governed by the Wright -Fisher 
diffusion that determines the population sizes in each deme. What is surpris- 
ing is the range of parameter values for which the coalescent approximation 
is valid. One might expect that the mutation rates between alleles P and 
Q need to be large enough that the allele frequencies stay away from the 
margins where the diffusion approximation should break down. As we see 
in Theorem 5.1, in fact we have convergence to the diffusion approximation 
provided that the mutation rates between the selected alleles are nonzero. 

In order to exploit the diffusion approximation, we use it to write down 
a system of ordinary differential equations for the probability of identity in 
state for a sample of size two from the neutral site (Theorem 6.1). Predictions 
of this model are compared not only to those of the Wright-Fisher model 
that it is approximating, but also to those of the constant deme size model 
in Section 7. We see in particular that the fluctuations in allele frequency 
have a significant effect on the probabilities of identity. 

This approach was first suggested by Kaplan, Darden and Hudson (1988, 
1989) who essentially wrote down the same coalescent approximation, al- 
though they did not address mathematical questions of existence of the 
corresponding process or convergence to the limit. They also wrote down 
the system of differential equations that determine the total length of the 
ancestral tree of a sample under this approximation. These are of exactly 
the same form as the equations for probability of identity that we obtain 
here. Because of the singular nature of the coefficients and the difficulty in 
assigning boundary conditions to the system, they develop a novel numerical 
solution scheme. However, this has not been exploited in the literature. Here 
we have been able to identify the boundary conditions for the system and as 
a result our numerical techniques are based on standard software. They are 
described in detail in the companion paper Barton and Etheridge (2004). 

A particular strength of our approach is that it applies to very general 
forms of selection with essentially no additional computational effort. Nu- 
merically it is considerably more efficient than simulations based on the 
ancestral selection graph and moreover, in contrast to such simulations, the 



4 N. H. BARTON, A. M. ETHERIDGE AND A. K. STURM 

computational efficiency does not decrease as the strength of selection in- 
creases. Although we concentrate on the simple setting of a sample of size 
two embedded in a genetic background with just two possible states, the 
approach is easily extended to larger samples and more complex genetic 
backgrounds (albeit at the expense of increased computational complexity); 
see Remark 6.2. As we see in Section 7, even the simplest context provides 
considerable scope for investigation of important biological issues. We do 
not explore it here, but we also see the ordinary differential equations as 
offering a valuable analytic route to a perturbation analysis of identities in 
fluctuating backgrounds. 

Our primary motivation is the desire to investigate the effects of the 
selection on the coalescence times for the neutral site. However, we also re- 
gard this as an important step in understanding more general versions of 
the structured coalescent in randomly fluctuating backgrounds. This is cru- 
cial to understanding populations with complex spatial or genetic structure 
where the number of individuals in each background is not sufficiently large 
that fluctuations can be ignored [see Barton, Depaulis and Etheridge (2002), 
Barton and Navarro (2002) and references therein]. 

The rest of the paper is laid out as follows. In Section 2 we describe our 
model for the (forward in time) evolution of the proportions of the P and 
Q alleles. Our starting point is a version of the Wright-Fisher model. For 
later comparison with the diffusion approximation, we obtain a system of 
algebraic equations for the probabilities of identity in state for a sample 
of size two from such a population. As we explain above Definition 2.1, to 
obtain the diffusion approximation it is convenient to work with the contin- 
uous time counterpart of the Wright-Fisher model, the Moran model. For 
such a model, assuming that the frequencies of the selected alleles, P and 
Q, have reached stationarity, we write down the generator of the process 
{{p (*)) n i W> n 2 (*))}*>o that encodes the backward in time evolution of 
the selected allele frequencies and the numbers of ancestors of the sample 
of neutral alleles alive at time t before the present, labeled according to 
their background (P or Q). In Section 3 we rescale the parameters in our 
model and establish the form of the generator of the corresponding diffusion 
approximation. The existence of a stochastic process with this generator 
and convergence of the rescaled processes to this limit are established in 
Sections 4 and 5, respectively. In Section 6 we write down a system of dif- 
ferential equations for the distribution of coalescence times and hence, for 
the probability of identity in state in a sample of size two. We establish an 
iterative solution to the system and indicate the extension to larger samples 
and more complex genetic backgrounds. In the final section, Section 7, we 
illustrate, in the case of balancing selection, the extremely good agreement 
with the probabilities of identity established via the Wright -Fisher model 
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of Section 2. Although in biological applications one is typically concerned 
with a neutral linked locus, we concentrate on the selected locus (r = 0) for 
easy comparison with alternative approaches. We conclude by comparing 
the predictions of the model to those obtained by assuming that the fre- 
quency of alleles at the selected site is deterministic for different strengths 
of balancing selection. A full discussion of the biological implications can be 
found in the companion paper Barton and Etheridge (2004). 

2. The model. In this section we describe the underlying model. First 
consider the evolution of frequencies at the selected locus. We write p and 
q = 1 — p for the proportions of P and Q alleles respectively. Our starting 
point is a Wright-Fisher model with selection and mutation between types. 
We assume a diploid population of size N. Thus each individual has type PP, 
PQ or QQ and we write P\\, P\ 2 and P22 for the corresponding proportions 
of each type. Then 

p = Pn + \P 12 , 

q = P 22 + \P 12 . 

During the reproductive process, each individual has a large (effectively 
infinite) number of germ cells (cells of the same genotype) that split into 
gametes (cells containing just one chromosome from each pair). The gametes 
then fuse at random to form the next generation. We assume that there is 
selection in favor of certain genotypes. Further there is mutation from type 
P to Q and vice versa. 

Suppose that immediately before the reproductive step, the proportion 
of type P is p. For simplicity we assume multiplicative selection. That is, 
relative fitnesses of PP : PQ : QQ can be expressed in the form u 2 : uv : v 2 . 
This means that we can model selection as acting on haploids, so that after 
selection the proportion of type P will be 

p = 

1 + sp 

for some s. In the case of directional selection, s is just a constant, but by 
taking s to be frequency dependent (i.e., a function of p) we can approximate 
more complicated selection acting on the diploid population. For example, 
balancing selection is modeled by assuming that s = so(Po ~ p) f° r some < 
Po < 1 and constant sq. If the population size is sufficiently large, this is close 
to a model of overdominance with relative diploid fitnesses PP : PQ : QQ of 
1 - s q : 1 : 1 - s po, where q = 1 - p . 

We now account for mutation between P and Q. Suppose that in each 
generation a mutation from P to Q has probability \i\ and from Q to P has 
probability fx 2 . After the mutation step, the proportion of type P is then 

p** = (l-IM l )p*+H20—P*). 
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Finally 2N gametes are chosen at random to form the next generation. 
(These fuse at random into N diploid pairs.) The resulting number of type 
P chromosomes in the population will then be binomially distributed with 
2N trials and success probability p** . 

Now consider the neutral locus. This is on the same chromosome as the 
selected locus, but we allow for the possibility of recombination or crossover 
events. This happens during meiosis (the process of splitting into gametes). 
We assume that with probability r there is a recombination event between 
the selected and neutral sites. The result is that the two gametes exchange 
a portion of chromosome that includes the selected site, but not the neutral 
site. Consequently, if such an event occurs, for each of the two gametes the 
portion of the chromosome that includes the neutral locus and that segment 
including the selected locus come from different parental chromosomes. From 
the point of view of the neutral locus, the two chromosomes swap types at 
the selected locus. By this mechanism (as well as by mutation) an individual 
from the sample at the neutral locus can be in a different genetic background 
from her parent. 

So that we can later numerically test the accuracy of the diffusion approx- 
imation, we now use the Wright-Fisher model to calculate the probabilities 
of identity in state at the neutral locus of a sample of two genomes whose 
type at the selected locus is known. (We shall refer to "individuals" in the 
sample to mean the ancestors of an allele at the neutral locus, as opposed to 
an individual in the diploid population. Since the transitions of the model 
can be interpreted as acting on haploids, this should cause no confusion.) 
The probability of identity will depend on the past history of the population. 
If we knew that history then we could calculate the identities by iterating 
backwards in time. We can still make progress if we assume that the pop- 
ulation is drawn from a stationary distribution. The Wright-Fisher model 
described above is just a finite state space Markov chain. The probability of 
going from i copies of P at time t to j at time t + 1 is given by 

Pn=( 2N ](p**) j (l-P**) 2N - j . 



% = ( 2 f)(/*) i (l 



Provided that it has a nondegenerate stationary distribution {V'i}|=i (which 
is true provided that the mutation rates Hi and \ii are strictly positive), then 
we can reverse the process. The transition probabilities for the backwards 
in time evolution of the number of type P genomes in the population are 
given by the prescription 

r ■ • — — p ■■ 
31 Vj ■ r 

In following the history of the sample at the neutral locus we must decide 
whether each individual was associated with a type P or a type Q at the 
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selected locus in the previous generation. This association can change from 
parent to child as a result of mutation or of recombination. Following through 
the reproductive process above we see that after the selection and mutation, 
the proportion of the P-population that has arisen by mutation is 

^q 

mp ■■ 



(1 - fii)p* + /J, 2 q* 



where we have used the notation q* = 1 — p* . Similarly, the proportion of 
the Q-gametes that have arisen by mutation is 



(1 - fi 2 )q* + IX\V* 

Recall that the probability of a recombination event between the selected 
and neutral sites in one generation is r. We need to know the probability, 
rhp, that a neutral locus currently associated with a type P background 
was associated with a type Q background before the effects of mutation 
and recombination. First observe that if there is a recombination event, the 
chance that it is with an individual that is type Q after the mutation step 
and also with one of type Q before the mutation step is (1 — mg)(l — p**). If 
there is either no recombination event or a recombination with an individual 
whose type after the mutation step is P, then we require that the type P 
arose by mutation. Thus, writing q** = 1 — p* 



,.** 



rhp = r(l — mQ)q** + (1 — rq**)mp. 

Similarly, 

™>Q = rp**(l - mp) + (1 - rp**)rriQ. 

We now have all the information that we require to write down recursions 
for the quantities of interest at the neutral site. In particular, we are in a 
position to write down recursive equations for the probability of identity in 
allelic state for a sample from the neutral locus. (It is this quantity that 
we shall concentrate on in our numerical examples of Section 7.) We assume 
that in each generation mutation to a novel allele at the neutral locus occurs 
with probability v and also that the frequencies at the selected locus have 
reached stationarity. We then write {fpp t i, fpQ,i, fQQ,i} for the probabilities 
of identity in state of a sample of two gametes given that the current number 
of copies of the P allele in the population is i. The subscripts PP, PQ 
and QQ designate the type at the selected locus of the two individuals 
in our sample. Writing {fppi,fpQi-,f*QQi\ for the probability of identity 
after selection, mutation and recombination at the selected site we have, for 
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Kj<2N-l, 

fpp,j = ^2 T Ji(fQQ,i' fh P + 2 fpQ,i' fh p{ 1 ~ mp) + fpp,i(l ~ mp) 2 ), 



fpQJ = £ r ii(/<3Q.*( 1 ~ rhQ)rh P 

i 

(!) + fpQ,i{rhprhQ + {l-rhQ){l-rhp)) 



+ fpp,i"fhQ(l - rhp)) , 

fQQj = £ r ii(/<W>*( 1 ~ ™q) 2 + 2/pQ,im Q (l - mq) + fpp yi m 2 Q ). 
% 

After taking into account random sampling and mutation at the neutral 
locus, the identities become 

fpPj = (l-V) [fppj + -■ — — 



(2) fpQj = 0.-vyfpQ 



:r 



(i-^r fh Q ,+ 



a-/ 



QQ,j' 



2N-j 

With f PP:1 = 1, fQQ,2N-l = 1- 

Armed with these equations, numerical calculation of probability of iden- 
tity now amounts to iterating matrix equations to find a fixed point. How- 
ever, there is an obstruction to studying this system of equations analytically. 
Although with strictly positive mutation rates the Wright-Fisher model 
must have a stationary distribution, it is not known explicitly and con- 
sequently, neither are the transition probabilities Tij. In our simulations of 
Section 7 these are calculated numerically. The numerical estimates show 
that the Wright-Fisher model is not reversible (that is T^ does not coincide 
with Pij). In the next section we write down a diffusion approximation for 
the Moran version of this model. The distinction between the Moran and 
Wright-Fisher models is that in the Moran model we have overlapping gen- 
erations. This has the advantage that the frequency of P-alleles will then 
be a generalized birth death process and consequently has a unique invari- 
ant measure and this invariant measure is reversible. The backward in time 
transition probabilities for the proportion of type P are then known: they 
are just given by the forward in time transition probabilities. From Ethier 
and Kurtz [(1986), Chapter 10, Section 2] we can check that the diffusion 
approximations for the Wright-Fisher model and that found here for the 
Moran model are the same. Moreover, since both models are exchangeable, 
the genealogies of a sample from the population predicted by the two models 
will coincide in the diffusion limit [see Kingman (1982a)]. 
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To identify the appropriate Moran version of our model, consider again a 
diploid population, but now evolving according to a continuous time Markov 
chain so that, in particular, generations overlap. A single step of the chain 
corresponds to the death of one (diploid) individual and its replacement by 
another. Such deaths occur at exponential rate N and each individual in 
the population is equally likely to die. The reproductive step follows the 
same sequence as before. Writing, as before, p* and q* for the proportions 
of the two types of gamete after the action of selection, and p** , q** for the 
corresponding proportions after both selection and mutation, we have Table 
1. 

For convenience, we write (1 + s)/(2 + s) = (1 + S)/2. Since in the Wright- 
Fisher model an individual chooses her parents at random, we see that the 
natural continuous time analogue of our Wright -Fisher model for the evo- 
lution of allele frequencies at the selected site is the following version of the 
Moran model. 

Definition 2.1 (The Moran model). The Moran model of a population 
of size 2N is a continuous time Markov chain. At exponential rate N, a pair 
of individuals is chosen at random from the population. One dies and the 
other reproduces. If the pair chosen consists of one type P and one type Q 
individual, then the probability that it is the P individual that reproduces 
is (1 + S)/2. A type P parent produces a type P offspring with probability 
1 — fjii, otherwise her offspring is type Q. Similarly, a type Q parent has type 
Q offspring with probability 1 — /j,2, otherwise her offspring is type P. 

Remark 2.2. (i) As before, we can take the parameter s, and conse- 
quently S, to be frequency dependent. 

(ii) We are assuming that the sampling at each birth/death event is with 
replacement. This simplifies the expressions for the transition probabilities 
of the process of allele frequencies at the selected locus, but the price that we 
pay is that it will somewhat complicate those for the transition probabilities 
for the sample as we trace backwards in time. Whether we sample with or 
without replacement will not change the diffusion limit. 





Table 1 






Type to die 


P* 




p** 


PP 
PQ 

QQ 


1 
(l + a)/(2 + «) 




(1- 


M2 
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In order to keep track of the type at the selected locus of individuals in 
our sample from the neutral locus, we must also incorporate the effects of 
recombination. As before, we suppose that at each birth/death event there 
is a probability r of a recombination event. The type at the selected locus 
of the offspring of such an event is then inherited not from her parent, but 
from the individual that died. By this mechanism, as well as by mutation 
at the selected site, we see migration between the two genetic backgrounds. 

Suppose then that we have a sample of individuals from our population 
and that the type at the selected locus of each individual in our sample is 
known. We write n\ (0) for the number of individuals in the sample in back- 
ground P and n 2 (0) for the number in background Q. We are concerned 
with the ancestry of the sample. (We superimpose the effects of mutation to 
a novel type at the neutral locus later.) Thus we write n\ (t) for the number 
of ancestors associated with type P at the selected locus and nj> (t) for the 
number of ancestors associated with type Q at time t before the present. 
We write p^ 1 ' (t) for the proportion of the whole population that are type P 
at that time. 

Our final task in this section is to write down the generator of the (back- 
ward in time) process {(p^(t),n\ (i),n 2 (t))}t>o- We suppose that p^'(0) 
is drawn from the stationary distribution of {p( l > (t)}t>o an d (n\ (0), n 2 (0)) 
is arbitrary. The generator will be a very cumbersome object. Mercifully, 
things will be greatly simplified when we pass to a diffusion approximation. 

As we remarked above, the stationary distribution for the Moran model 
is reversible, so the backwards in time dynamics of the allele frequency, 
{p (t)}t>0i are the same as the forwards in time ones described in Defini- 
tion 2.1. We are going to need the transition probabilities for this process. 



Lemma 2.3. Consider the probabilities in the jump chain of {p^ 1 ' (£)}t>o • 
Suppose that — looking backward in time — the proportion of the population 
of type P immediately before and after an arbitrary birth/death event are 
Pm andp m +i, respectively. Writing P Pt p = P[p m+ i= p\p m = p], we have 



(3) 



P p , p =p 2 (l- W ) + (l-p) 2 (l-/i 2 ) 

Jl+S 1-S 

+ 2p(l - p) I — ^— Hi + — g— /i 2 



/ 1 — 5 

(4) P p , P -i/{2N) = P ( JV*i + 2 ( l - P)— 2~ (1 ~ Ate) 

(5) P p!p+1/{m) = (l-p)((l-p) t i 2 +2p±±?-(l - Ml )). 
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We can now write down the generator of the triple {(p( l '(t),n\ (t), 
n 2 (*))}*>o- We write 



W p+=p + Jn 
(i-p-)> ?=(i-p)> ?+ = (i-p+)- 



P-=P-^. ^=^+2iV 



Lemma 2.4. The generator, Am, of the process {(p^(t),ri[' (t),n 2 (*))}t>o 
is given by 

A(i)f{p,ni,n 2 ) 

( ( ni ) ^ 

x (/(p_,ni - l,n 2 ) - f(p,n 1 ,n 2 )) 

f ni (2Nq-n 2 ) 
+ Nc -{ wVa W 

+ 2A^~ (1 + 5 " )(1 " ^ + 2A^~ 2A^ 2 } 
x (f(p-,m ~ 1,712 + 1) - f(p,n 1 ,n 2 )) 
+ {NP PtP -i/( 2N ) -R\ - R 2 )(f(p-,nx,n 2 ) - f(p,ni,n 2 )) 

( C 2 ) n n 1 

+ iV(l - r)cJ^-p + g+(l - S + )(l - (i 2 ) + -^- p+pfi A 

x (f{p+,ni,n 2 - 1) - f{p,n 1 ,n 2 )) 

+ Nc+ { 4N*pa P+P ^ 

+ w- q p+q+{l " 5+)(1 " ^ 2) + ik p+ ^} 

x (/(p+,m + l,n 2 - 1) - f(p,n 1 ,n 2 )) 
+ ( NP p, P +i/(2N) -Ra- Rb){f{p-¥,nx,n 2 ) - f(p,m,n 2 )) 

( ( ni ) 1 

+ Ar ( 1 - r ){^%^-( 1 -^) + ^^ 1 - 5 ^} 

x (f(p,n\ - l,n 2 ) - f(p,ni,n 2 )) 
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(- ( " 2 ) 
+ iV ( 1 " r )| 72A^7^+(1 - to) + ij^Pli 1 + S )Vl 

x (f(p,ni,n 2 - 1) - f(p,n 1 ,n 2 )) 

, Km ,ni(2Nq-n 2 ) , . 

+ ^ (1 " r) 4N*pq m{1 " 5)/ " 2 

x (f(p,n\ - l,ra 2 + 1) - f{p,n 1 ,n 2 )) 

n 2 {2Np-ni) 

+ N 0- - r ) — 4]y2^ — MO- + S)fii 



x (f(p,ni + l,n 2 - 1) - f{p,n 1 ,n 2 )), 



where 



c_ = 


p 
p 


c+ = 


p 

D 



S = S ^ = ^TT-V S. = S(p-), S + = S( P+ ), 
2 + s(p) 

p(p//i + (l-p)(l-5)(l-At 2 )) 
(l-p_)((l-p_)M2+P-(l + 5_)(l-^))' 

(l-p)((l-p)// 2 +p(l + 5)(l-//i)) 
- p+,p P+(P+/^i + (1 -P+)(l - 5 '+)( 1 - A«2)) 
and i?j denotes the rate in the ith term of the above expression. 

Proof. Conditional on the changes in p, we calculate the probabilities 
of the possible changes in {n\,n 2 ). A backward in time birth/death event 
corresponds to a forward in time one. To establish the genealogy of our 
sample, we need to know the role of individuals in the sample in this forward 
transition. Viewed backward in time the possible transitions of the sample 
are "migrations," in which the type of the parent of an individual in the 
sample differs from that of her offspring, and "coalescences," in which two 
individuals in the sample arise from the splitting of an individual in the 
previous generation. Although when we pass to the diffusion limit these 
processes will not happen in a single step (so that two individuals in the 
sample of different types will not coalesce), here we cannot exclude that 
possibility. 

First observe that exactly three individuals are involved in the (forward 
in time) birth/death event: the individual that died, the one that split (i.e., 
gave birth) and her offspring. We write (i,j,k) with i,j,k E {P, Q} for the 
event that the types of these three individuals are respectively i, j, k. Because 
we have assumed that we are sampling with replacement, the individual that 
died can coincide with the one that split. In this case, we write (i,j,k) as 
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(i,k). Let us write p m ,p m +i for the forwards in time process immediately 
before and after an arbitrary birth/death event. From Bayes' rule, 

P[0\ h ty \Pm+l = P,Pm=p} = P[(», j, k) \p m = P, Pm+1 = P] 

= W[(i,3, fc) n {pm+l = P} \Pm = P] 
~ p. 



n(i,j,k)\p m =P] 



p 



p.p 



X(id,k),pj» 



where X(i,j,k),p,p * s one ^ * ne even t (hJik) results in a (forward in time) 
change in the proportion of type P from p to p and zero otherwise. 

We now use this prescription to calculate all nonzero conditional prob- 
abilities of this form. Again we write p m ,Pm+i for the proportions in the 
jump chain backwards in time. First suppose that the gene frequency does 
not change: 



F[(P,P,P)\p m =p,p m+1 =p] 



[(P,P)\Pm=P,Pm+l=p] 



[{Q,Q, Q) \p m = p, p m+ i = p] 



[(Q,Q)\Pm=P,Pm+l=p\ 



F[(P,Q 1 P)\ Pm = p,p m+1 =p] 



F[(Q,P,Q)\ Pm =p,p m+1 =p] 



p(p-l/(2A0)(l-/ii) 

p 

r p,p 

(p/(2JV))(l- m ) 



P 



p.p 



(l-p)(l-p-l/(2JV))(l- M2 ) 

p 

r p,p 

(l-p)(l/(2JV))(l-/, 2 ) 

P ' 

r p,p 

p(l-p)(l-S)tl 2 

P ' 

r P,P 

P (l-p)(l + S)lMl 



P 



p.p 



When a type P individual is lost (looking backward in time) we have 



(Q,Q,P) 
(Q,P) 



Pm=P,Pm+l=P 



Pm=P,Pm+l =P 



(Q,P,P) 



Pm =P,Pm+l =P 



1 

m. 
1 

" 2iV 

1 " 

2iV 



;i-p + i/(2jv))(i-p)/x 2 

Pp-l/(2N),p 

(l-p+l/(2N))l/(2N)p 2 



P 



p-l/(2N),p 



(p-l/(2JV))(l-p + l/(2JV))(l + 5_)(l-Ati) 



P. 



p-l/(2N),p 
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Finally, if a type one individual is gained (looking backward in time) we 
have 



(P,P,Q) 


p m -- 


1 " 

= p, Pm+1 =p+ — 


F 


(P,Q) 


Pm 


1 " 

= p,p m+1 =p+ — 


(P,Q,Q) 


Pm 


1 1 

= p,p m+l =p+ — 



(p + l/(2N)) Pf i 1 



P, 



p+l/(2N),p 
(p + l/(2JV))(l/(2JV))// 1 

Pp+l/(2N),p 



_ (p + l/(2JV))(l-p-l/(2JV))(l-5 + )(l-// 2 ) j 

Pp+l/(2N),p 

All that remains is to establish the probability that these birth/death 
events involved individuals in the sample. 

First we consider coalescence. A coalescence of two individuals associated 
with type P occurs in the sample if the parent and offspring are both as- 
sociated with type P and both form part of the sample and there was no 
recombination. Thus if p m = p, conditional on (P,P,P) or (Q,P,P), this 

happens with probability (1 — r)( 2 1 )/( o )' Similarly for a coalescence of 
two individuals in the sample associated with type Q. For individuals as- 
sociated with type P and Q from the sample to coalesce requires an event 
of the form (i, Q, P) or (i, P, Q) (corresponding to parent and offspring hav- 
ing different type) and conditional on one of these events happening, has 
probability (1 - r) 2Np ffi± 2Np) ■ 

Now we consider "migration." An individual in the sample can "migrate" 
from one background to the other as a result of mutation or recombination. 
In either case she must be the offspring of a birth/death event. If the parent 
and the individual that die have different types, then a recombination com- 
bined with a mutation does not lead to a change in background. Thus condi- 
tional on p m = p and a birth/death event that involved mutation from type 
Q to P forward in time, an individual in our sample will migrate from type P 



to Q (backward in time) with probability (1 - 



[ n 1 (2N-2Np-n 2 ) ■ 



2Np{2N-2Np) 



if the individ- 



ual that dies and the individual that split are different types, 2Nv(2N-^,Nv) 
if they are the same type but different individuals and n\/(2Np) if the 
individual that dies is the parent. Similarly, conditional on p m = p and a 
birth/death event involving a mutation from P to Q, an individual in our 
sample migrates from type Q to P (backward in time) with probability 

0- ~ r ) 2 n Np 2 {2N~2Np) if the event is (Q>P,Q), with probability 2^^"^) 



if the event is (P, P, Q) and with probability 



n2 
2N-2Np 



if the event is (P,Q). 
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If there is no mutation, an individual in the sample can still change type 
due to recombination: conditional upon (Q,P,P) with probability ^fc, a 
member of the sample migrates from background P to Q and conditional 
on (P,Q,Q) with probability 2 n-2Np > an individual of the sample migrates 
from background Q to background P. 

Finally, recalling that events in the jump chain take place at an exponen- 
tial rate N, we obtain the claimed expression. □ 

3. The generator of the diffusion approximation. In this section we iden- 
tify the generator of the diffusion approximation corresponding to the back- 
ward in time model of Section 2. Existence of a corresponding stochastic 
process and convergence to the limit is deferred to the following sections, 
but our proofs will require the following assumption on the selection coeffi- 
cient. 

Assumption. The selection coefficient, s : [0, 1] — > M is a Lipschitz con- 
tinuous function. 

As usual, we speed up time by a factor of diploid population size, N, and 
correspondingly scale down the parameters in the model by the same factor. 
Thus ^ i— > Hi/N and r i— > r/N. The selection coefficient, s, is also scaled by 
N. Notice that 

» + '/"_i( 1+ .Wiy 



2 + s/N 2\ 2NJ \N, 

so that at the iVth stage of the rescaling S = s/(2N) + o(l/N). 

We write -A(jv) for the generator of Lemma 2.4 with parameters scaled in 
this way. 

Lemma 3.1. Let E = [0, 1] x {1, . . . ,m(0) + n 2 (0)} x {1, . . . ,m(0)+n 2 (0)} 
and suppose that f(p, m, n 2 ) : E — > R is twice continuously differ -entiable with 
respect to p. Then for < p < 1, 

A( N )f(p,ni,n 2 ) ->Af(p,ni,n 2 ) as N^oo, 

where 

Af(p,ni,n 2 ) 



(6) 



2~ ( ^ ) (f(Pi n i l,n 2 )-/(p,ni,n 2 )) 



(7) + — f ^ 2 j (/(p,ni,n 2 -l)-/(p,ni,n 2 )) 

(8) + -/J>i — (f{p,n! + l,n 2 - 1) - f(p,ni,n 2 )) 

q 2 
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(9) + -M2-7r(/(p,ni - l,n 2 + 1) - f(p,ni,n 2 )) 

p 2 

(10) +r—(/(p, ni + 1,712- 1) - f(p,n 1 ,n 2 )) 

(11) +r—(/(p, ni -1,712 + 1) - f(p,ni,n 2 )) 

(12) + (-Mip + ^2<7 + spq)-f'(p, m,n 2 ) + jpqf"{p, ni,n 2 ), 
and ""' denotes differentiation with respect top. 

Remark 3.2. Note that //i, the mutation rate for P — > Q forward in 
time, is involved in jumps Q — > P in this backward in time generator (anal- 
ogous for ^2). 

Proof of Lemma 3.1. Fix p>0 and consider the generator A^, 
which is the generator An\ of Lemma 2.4 with the parameters scaled as 
above. Notice that c+ and c_ tend to one as N — > 00 and that the effect of 
speeding up time is to multiply the whole generator by a further factor of 
N. 

Like An\ the generator Ar^\ consists of ten terms corresponding to all 
possible events. The first and seventh term sum to give 

Y v ("a 1 ) (f(p,ni - l,n 2 ) - /(P,ni,n 2 )) + o(i). 

The fourth and eighth terms sum to give 

2~ ( ""2 ) ^^' ni ' n2_1 ) _ ^^' ni ' n2 )) + ( 7v 

Ignoring the part arising from recombination, the fifth and tenth terms sum 
to give 

-Miy(/(P> re i + !> n 2 - 1) - f(p,n 1 ,n 2 ))+o(—j. 

Ignoring the part arising from recombination, the second and ninth terms 
sum to give 

-M2y(/(p,rai - l,n 2 + l) -f(p,ni,n 2 )) + o(— j. 

The contribution to the second and fifth terms from recombination sum to 
give 

r—(f(p,ni + l,n 2 - 1) - f(p,n 1 ,n 2 )) 

+ r^-(f(p,n l -l,n 2 + l)- f(p,n 1 ,n 2 )) + o( — J. 
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This leaves the third and sixth terms. First observe that NRi for i = 1, 2, 4, 5, 
are all 0(1), whereas N 2 P PtP _i^ 2 N) an d -^ 2 -Pp, P +i/(27V) wm both be 0(N 2 ). 
Using smoothness of / as a function of p, we expand f{p-,ni,n 2 ) and 
f(j) + ,ni,n 2 ) in a Taylor series about (p, nx,n 2 )- The third and sixth terms 
then become 

(N 2 P p ^ 1/(2N) -NR 1 -NR 2 )(-^-f'(p,n 1 ,n 2 ) + -^f"(p,n 1 ,n 2 : 



2N J *>•'»•"> ' 8N 2 - 
±f'( p , ni ,n 2 ) + t 



(N 2 P PtP+1/{2N ) ~ NR A - NR 5 )(^f'(p, ni ,n 2 ) + -L/"(p,m,n 2 ; 



where we have again used '"" to denote differentiation with respect to p. 
Substituting from Lemma 2.3 we obtain 

/ 1 1 

+^+K i +^)( i -f 

< I —f'(p,n 1 ,n 2 ) + — =/ // (p,ni,n 2 ) J +C( — 



which reduces to 

"PW + Y + /^2P<7 I 2 / (P> n l) n 2j 

+ f 9^2 + — - fiipq) 2^'(P' n i,n 2 ) + -£pqf"(p, ni,n 2 )+Oi — 

1 1 / 1 

= (- W + /^2<? + spq)-f'(p, m, n 2 ) + -pqf'ip, n u n 2 )+0[ — 

Letting N — > oo in the above expressions completes the proof. □ 

Remark 3.3. There are no surprises in the form of this generator. If we 
think of our population as subdivided into two demes according to the type 
at the selected site, then we should expect the genealogy of the sample to 
be given by a structured coalescent. Thus the terms (6) and (7) correspond 
to the coalescence of individuals in the same deme, which happens at a rate 
inversely proportional to the population size within that deme. The terms (8) 
and (9) reflect migration between demes as a result of mutation. The rates 
must be scaled by the ratio of the population sizes in the different demes, just 
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as in the structured coalescent. The terms (10) and (11) reflect migration 
due to recombination. Evidently these rates must be proportional to the 
proportion of the population that is of the opposite type. (Recombining with 
an individual of one's own type has no net effect.) Finally the term (12) is 
simply the generator of the diffusion approximation to our Moran model for 
allele frequencies. 

4. Existence of the diffusion approximation. It is not immediately obvi- 
ous that there should be a stochastic process with generator given by (6)- 
(12). The immediate problem is that the coalescence and migration rates for 
the sample become unbounded as the allele frequency p tends to zero or one. 
This means that, in principle, we could see an infinite number of jumps in 
finite time. However, what we shall see is that this does not happen because 
the process jumps away from the "bad region" in a finite number of jumps. 

First let us define "bad" (or rather "good" regions) for the process. Ev- 
idently we want to keep away from regions where p is small and m ^ or 
where q is small and n 2 ^ 0. We therefore define 



n (k) 



1 



u 



<{0}x{0,l,...,ni(0) + n 2 (0)} 

{0,l,...,ni(0)+n 2 (0)}x{0} 



H> 



U Q, 1 - jA x {0, 1, 2, . . . , m(0) + n 2 (0)}{0, 1,2,..., m(0) + n 2 (0)}. 

Notice that the sets U^ k > are open subsets of the state space E. 

The first, straightforward, task is to show that the process exists until its 
exit time from LA k > for each h. In an obvious notation, we write A n for the 
portion of the generator corresponding to the terms (6)-(ll) and A p for the 
portion corresponding to the generator of the allele frequencies, namely (12). 



Lemma 4.1. We write C 2 (E) for bounded functions f : E — ► R which are 
twice continuously differentiate with respect to p. Define 

A p k) f(p, ni,n 2 ) = Xuw{{p, n 1 ,n 2 ))A p f(p, m, n 2 ), 

An^fiP, ni,n 2 ) = Xuw((p, ni,n 2 ))A n f(p, m, n 2 ). 

Then the closure of 

{(f,A^f):fGC 2 (E)}^{(f,A p ^f + A^f):feC\E)} 

generates a Feller semigroup. 
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(k) 

Proof. This is standard. First consider A p applied to functions / € 
C 2 ([0,1]). Then from, for example, Ethier and Kurtz [(1986), Chapter 8, 
Theorem 2.8], the closure of {(f,A p f) : f G C 2 ([0, 1])} generates a Feller 
semigroup on the continuous functions on [0, 1] . (It is here that we have 
used that the selection coefficient, s, is a Lipschitz continuous function.) 

We also know that, for fixed p € [0, 1], An generates a Feller semigroup on 
continuous functions on {0, 1, . . . ,ni(0) + ^(O)} x {0, 1, . . . , ni(0) + n2(0)} 
(since the state space is finite and the jump rates are all bounded). Evi- 
dently both generators can be regarded as acting on E and they are Feller 
generators on continuous functions on E. 
Now observe that 

Il4 fe) /IL < K(0) 2 + n 2 (0) 2 + (2k + r)(m(0) + ^(O))))!/^ 

and so A^ k > is a bounded perturbation of A p and hence also generates 
a strongly continuous contraction semigroup [see Ethier and Kurtz (1986), 
Chapter 1, Section 7]. That the resulting semigroup is positive and conser- 
vative is an easy consequence of the Trotter product formula and the proof 
is complete. □ 

We can think of the processes constructed in Lemma 4.1 as solutions to 
a stopped martingale problem. Let r k = inf{£ > : (p(t),ni(t),n 2 (t)) £ U^ k '}. 
Then, for / in the domain of A, 

f(p(t), ni (t),n 2 (t))- [ tATk A^f(p(s), ni ( S ),n 2 (s))ds 



(13) „ t AT k 

= f(p(t),n 1 (t),n 2 (t))- / Af(p(s),n 1 (s),n 2 (s))ds 

Jo 

is a martingale. We remark that this stopped martingale problem is well 
posed since it is associated with a Feller generator [see Ethier and Kurtz 
(1986), Chapter 4, Theorem 4.1]. 

To establish existence of the process corresponding to the generator A on 
the whole of E, we shall use the following result [Ethier and Kurtz (1986), 
Chapter 4, Theorem 6.3]. 

Theorem 4.2. Let (E,d) be a complete and separable metric space and 
let A C C(E) x B(E). LetUiCU 2 C--- be open subsets ofE. Fix v G V{E) 
and suppose that for each k there exists a unique solution X^ of the stopped 
martingale problem for (A,u,Uk) with sample paths in De[0,oo). Setting 

T k = inf{t : X k (t) i U k or X k (t-) £ U k }, 

suppose that for each t > 0, 

(14) lim P{r fc < t} = 0. 

k— »oo 
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Then there exists a unique solution of the De[0,oo) martingale problem for 
(A,u). 

Here C(E) denotes bounded continuous functions on E, B{E) denotes 
bounded Borel measurable functions on E, De[0, oo) is cadlag paths in E 
and V{E) is probability measures on E. The generator A is to be thought 
of as {(/, Af)} for / in a suitable class. [In our case A should be thought of 
as the closure of {(f,Af) : f E C 2 (E)}.] The probability measure v specifies 
the initial distribution of our process. 

Our task then is to show that if we take Uk = U^ ' , then the condition (14) 
is satisfied. 

Proposition 4.3. With r^ as above, for any fixed t > 0, 

lim F{r k < £} = 0. 

k— >oo 

Of course, if the boundaries are inaccessible for the process {p(t)}t>o, 
which, as we show as part of Lemma 4.4, is true provided fj,i > 1/2 for 
i = 1,2, then there is no problem. The difficulty is to check that this is 
still true under the much weaker (and biologically more realistic) condition 
that we are assuming here, that \i{ > for i = 1, 2. The key is the following 
lemma. 

Lemma 4.4. (i) Suppose that fi 2 > 1/2 (resp. \i\ > 1/2). Then (resp. 
1) is an inaccessible boundary for the process {p(t)}t>o- If H>2 £ (0, 1/2) [resp. 
fXi € (0,1/2)], then it is accessible. 

(ii) Suppose that /i2 < 1/2 (resp. \X\ < 1/2). Then for any fixed value of 
p(0) G (0, 1) and any K > 0, writing T x (a) for the first hitting time of a by 
the process {p(t)}t>o given that p(0) =x, we have 

fpfojCVk) l 

■ds> K 



(15) lim ] 

k— *OD 

respectively, 



lim I 

k— >oo 



o p(s) 

t- P (o)(i-iA) i 



1, 



l-p(s) 



ds>K 



1. 



Remark 4.5. Equation (15) is not the strongest statement that we 
could make about the divergence of the integral, but it is the form that 
we require in the proof of Proposition 4.3. 

Proof of Lemma 4.4. Recall that for a one-dimensional diffusion pro- 
cess on the interval [0, 1] with generator 

1 / s d 2 , , , d 
L= 2 a(x) ^ + 6(X) ^' 
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the scale, n(x), and speed, m(x), are defined for x € [0, 1] by 



n{x) = j\* V (-j y c 2 -^dz)dy, 



m(x) = -— exp( / — — -dz)dy, 
J c a{y) \Jc a(z) J 

where c G (0, 1) is fixed arbitrarily. According to Feller's boundary classifi- 
cation, a boundary point e is accessible or inaccessible according as 

u{e) = I m{x)dn{x) 



■Jc 

is finite or infinite. 

For the process of allele frequencies, {p(t)}t>o, we have 

a(x) = \x(l — x), b(x) = tj(s(x)x(1 — x) — fj,\x + ^{l — x)). 

Substituting gives 

n(x)= f X exp(- ^ 2s(z) dz)y~ 2f12 (1 - y)' 2 ^ 1 dy, 

m(x) = £ -i v exp (^£ 2s(z) dz^j y 2 ^ (1 - y) 2 ^ dy. 

Since s(x) is bounded and continuous and /Xj > for i = 1, 2, we have n(x) ~ 
J x , y~ 2 ^' 2 dy as x J, which is bounded or unbounded according as \x<i < 1/2 
or H2 > 1/2. The symmetrical argument applied to the boundary point 1 
completes the proof of Part 1 of the lemma. 

Now suppose that \i2 < 1/2. Since p(0) is arbitrary, for the remainder of 
this proof we shall suppress it in our notation and write r(x) for the first 
hitting time of x. 

First we convert the process p(t) to natural scale. That is, we study the 
process Y(t) =n(p(t)). Now 

Y(t)=Y(0)+ f a(Y(s))dW s , 
Jo 

where {Wj}t>o is a standard Brownian motion started from n(p(0)) and 
(using ""' to denote differentiation) 



a(y) = n'(n- 1 (y)) v /ln-i(y)(l-n-i(y)). 

Without loss of generality, since scale is defined only up to translation, we 
may assume that n(0) = and then as x I our calculations above show 
that n(x) ~ x 1-2 ^ 2 . The quantity that we are interested in is 

rr(l/k) I 

Jo n~\Y(s)) dS - 
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Writing Y(t) = W(j(t)) where 7 is defined by 

n(t) 1 



, v ds = t, 
/o cr{W s ) 

and substituting r = 7(5) in the integral, we obtain 

,r(i/fc) 1 rr(r(i/fc)) 1 1 

I p(s) dS = Jo n-\W r ) a{W r ) dr - 

Now observe that 

n-\ X ) ~ X 1 /^- 2 ^), <r(x) ~ ^2/(1-2/^1/(2(1-2^)) ag ^ | 0> 

and so the behavior of (15) is determined by 

, 7 (r(l/fc)) l 
( 16 ) / TFT- dr , 

v ; Jo W r a 

where 



1 2 M2 1 1 , 1 

a= h -7z - — r = 1 + 



1-2^2 1-2//2 2(l-2// 2 ) 2(l-2/i 2 ) 

Notice that W(7(r(l/ib)) = n{l/k) implies that 7(r(l/jfe)) ~ T W (k~^~ 2 ^), 
where t (x) is the first hitting time of x by the Brownian motion {Wj}t>o- 
Finally, since /i 2 < \, a > 1 and limfe_ < . 0O 7(r(l/A;)) = r M/ (0), we deduce (15). 

□ 

Proof of Proposition 4.3. For a G [0, 1], we retain the notation r x (a) 
for the first hitting time of a by the process {p(t)}t>o given that p(0) = x. 

Suppose that is an inaccessible boundary for the process {p(t)}t>o- 
Then we have P [77 /£,(()) < i] — > as k — > 00. A fortiori, the probability that 
{(p(i),ni(i),n 2 (i))}t>o hits {^} x {i} x {?ii} for i ^ before time t tends 
to zero as k tends to infinity. Similarly, if 1 is inaccessible for the process of 
allele frequencies, the probability of hitting {1 — -r} x {n\\ x {1} for nonzero 
i tends to zero as k — > 00. 

We concentrate on the case when is an accessible boundary for the 
process {p(t)}t>o, that is P2 < 1/2. 

The idea of the proof is simple. If p approaches zero, then (15) implies 
that the probability of jumping from type P to type Q for individuals in our 
sample tends to one. On the other hand the jump rate from type Q to type 
P tends to zero. Therefore, with very high probability (tending to one), if 
the allele frequency is less than 1/k, we do not see a type P individual in 
(the ancestors of) our sample. Combined, if 1 is also accessible, with the 
symmetric argument as the allele frequency increases to one, we see that the 
probability of hitting the boundary of U^ k ' in finite time converges to zero 
as k — > 00. 
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We now make this argument more precise. Suppose that t > is fixed. 
We use the abbreviation n(0) =rti(0) +712(0). This is an upper bound on 
the number of individuals in the sample at all times. Now fix 5 > 0. For each 
p € (0, 1), define 

Al(p) = n(0) U(T^) + T 

and 

I- PHI 

A2CP) = =-• 

p 2 

Notice that A2 provides a lower bound for the rate at which individuals in 
the sample jump away from state P (provided that n\ 7^ 0), whereas Ai 
provides an upper bound for the rate at which they arrive. 

Let T e be an exponentially distributed random variable with rate Ai(e). 

Recall that for < a < x < b < 1 , 

n(b) — n(x) 



F[ Tx (a)<T x (b)] 



nib) — n(a) ' 



where the scale function, n(x) was defined in Lemma 4.4. Substituting, we 
see that by choosing N large enough, we can arrange that 

(17) P[t £ (0)<t £ (Ns)]>1-- uniformly in e <— . 

8 TV 

By choosing e to be still smaller if necessary we arrange that 

(18) F[T N£ >t]>l-^. 

Now let X be a Poisson random variable with mean K. Choose K large 
enough that 

(19) P[X>n(0)]>l--. 

8 

Finally suppose that p(0) > e and (with e fixed) using Lemma 4.4 choose ko 
large enough that for k > ko, 

T p(0)( l / k ) 



(20) 



X 2 (p(s))ds>K 

p(0)( £ ) 



>i-i. 



Now consider the first time T p ^(l/k) that the process of allele frequencies 
hits 1/k. We want to estimate the probability that ni(r p (o)(l/&)) = 0. We 
combine the above estimates as follows. We use equation (17) to restrict our 
attention to the event that between first hitting e and first hitting 1/k we 
always have p < Ne. Equation (18) then allows us to ignore the possibility 
that there are any jumps of individuals in the sample into state P in this 
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time interval. Equation (20) ensures that the rate of jumps out of state 
P is at least K (provided there are any individuals to jump) and finally 
equation (19) ensures that all individuals do indeed jump out of state P 
before the process hits 1/k. Thus, with probability at least 1 — 6/2 when the 
p process hits 1/k, n\ = 0. 

Started from p = 1/k and n\ = 0, we now let the process run until the 
first time T that n\ ^ 0. Evidently this is smallest if p = 1/k, n-i = n(0). 
Moreover, since the rate at which individuals jump into state P increases as 
p increases, provided that e < 1/2, this time is stochastically greater than 
T , the first time started from (l/2,0,ra(0)) that n\ ^ 0. The distribution of 
T is independent of e and 5. We want to apply the above argument once 
again to see that the next time the allele frequency hits 1/k, the probability 
that n\ = is at least 1 — 5/2. The only twist is that we may need to choose 
e smaller still to ensure that the probability that at the time T the process 
p is greater than e is at least 1 — 5/2. Notice that this last estimate can 
be obtained uniformly in k > 1/e by choosing e so that if we start from 
(0,0, ra(0)), then p{T) > e with probability 1 - 5/2. 

Now we are essentially done. The process hits the boundary p = 1/k with 
rii^O only after a geometric number of hits of p on 1/k. The success proba- 
bility for this geometric random variable is at most 5. Each "failure" accrues 
an additional waiting time bounded below by an independent copy of T . 
Since 5 was arbitrary, the proof is complete. □ 

5. Convergence. Having established existence of our candidate diffusion 
approximation, we now turn to proving that the rescaled processes of Section 
3 actually converge to this limit. That is, we prove the following theorem. 

Theorem 5.1. The processes {(p( N >(t),n\ (£),nr> (£))}t>o correspond- 
ing to the generators Ain) of Lemma 3.1 converge weakly in De[0,oo) as 
N — > co to the process {(p{t),ni(t),ri2(t))}t>o generated by A. 

The main tool in the proof will be the following result which is a special 
case of Ethier and Kurtz [(1986), Chapter 4, Corollary 8.7]. 

Theorem 5.2. Suppose that (E,d) is a complete separable metric space. 
Let A be a Feller generator on E corresponding to the Markov process X . 
For each N > 1, let X* ' be progressively measurable E-valued processes 
with full generators A^ N > and such that X^ N '(0) converges weakly to X(0) 
as N — > oo. Suppose that T>(A) separates points. Suppose further that the 
compact containment condition holds for {1W}jv>i. That is, for every e > 
and every T > there exists a compact set T Et T C E for which 

inf ¥[X w (t) G r £jT for < t < T] > 1 - e. 
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Suppose that for each (f,g) € A and T > there exist (/W, jW) € i^ 
and G^ C £ stic/i tfjcrf 

(21) lim P[XW (i) € G ^ , < t < T] = 1, 

JV— »oo 

sup^y II /* lloo < oo and 

(22) lim sup |/(x)-/W(z)| = 0= lim sup | 5 (x) -^(x)|. 

Then X^ N ' converges weakly to X as N — > oo. 

In fact, this result would be sufficient to allow us to prove the result in one 
fell swoop, but in view of the work of Section 4, it is convenient to proceed 
in two stages. First, taking the sets G^ N ' = E, we prove convergence of the 
stopped processes, -XV^fc), generated by -A(Ar,fc) = Xu( k )^(N)i to the process 
X^ for each k. The following lemma, which is a straightforward adaptation 
of Lemma 11.1.1 of Stroock and Varadhan (1979) then completes the proof. 



Lemma 5.3. Let {P^ n '}n>i be a sequence of probability measures on the 
space De[0, oo) and suppose that T^-> is a nondecreasing sequence of stop- 
ping times (with respect to the natural filtration) increasing to infinity almost 
surely. For each k>l, let {^ N,k '}N>i be a relatively compact sequence of 
probability measures such that ¥( N > k > is equal to ¥( N > on T T (k) ■ 

If the probability measure P has the property that, for any k>l, any limit 
point of {P^ N,k '}N>i agrees with P on J- T ( k ), then ¥^ N > converges to P as 

Proof of Theorem 5.1. First we fix k > 1 and consider the sequence 
of stopped processes {Xwy }jv>i ■ The compact containment condition of 
Theorem 5.2 is automatically satisfied since E is compact. We take the sets 
G n = E in condition (21). From Section 3 we see that we can take /( ' = / 
in condition (22) and convergence of the stopped processes is proved. 

Combining Proposition 4.3 with Lemma 5.3, the proof is complete. □ 

6. Differential equations for the identities. We now return to the prob- 
lem of calculating the probability of identity in allelic state at the neutral 
locus for a sample whose types at the selected locus are known. For simplic- 
ity, we consider the case of a sample of size two, but see Remark 6.2. 

Our approach is to use the diffusion approximation to write down a cou- 
pled system of ordinary differential equations for the probability of identity, 
indexed by the state at the selected locus of the sample, that is PP, PQ 
and QQ. Thus fpp will denote the probability of identity given that the 
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two individuals in our sample are both of type P. These quantities will be 
compared to the predictions of equations (1) and (2) in Section 7. 

If the current allele frequency at the selected locus is p (assumed as al- 
ways to have reached stationarity), then write Fpp(t,p) for the probability 
that at time t the (backwards in time) process {(p(i) , ni(i) , n2(t))}t>o is 
in [0, 1] x {0} x {1} U [0, 1] x {1} x {0} given that at time zero m(0) = 2, 
n2(0) =0. Similarly, define FpQ(t,p) and FQQ(t,p). We assume, as always, 
that {p(0)}t>o is drawn from the (reversible) stationary distribution for the 
process {p(t)} t >o- 

Since by our work of Section 4 there will be no point of accumulation of 
epochs of jump times for the process, following Feller [(1966), Section X.3], 
we see [by first conditioning on {p(t)} t >o] that {Fpp(t,p),Fp Q (t,p),FQQ(t,p)} 
can be characterized as the unique minimal solution to the following system 
of differential equations (we use F to denote the derivative of F with respect 
to t): 

Fpp= l -^ + ^f + r q yF PQ -Fpp) 

1 1 

+ -ji-^iP + V2Q + spq)F' PP + -pqFpp, 

FpQ = ^ (~ + r p) ( F pp ~ F PQ) + \ (— + r l) ( F QQ ~ F PQ) 

1 1 

+ ^(-MlP + A*29 + spq)F' PQ + -pqF'p Q , 



(23) 



QQ 



2 v r^ r-z z-*, r H 4 

F 2Q + (w + \ {FpQ _ Fi 



2q V q 

+ o i~^P + ^q + spq)F' QQ + -pqFQ Q . 



n\ r~±f r~*i rii yy ,1 

Suppose that the mutation rate to a novel allele at the neutral site is v, 
corresponding to the rescaling v \— > v/N of the model of Section 2, then con- 
ditional on the individuals having coalesced at time t, the probability that 
they are identical in state, that is, that there has been no mutation since 
time t along either of their lines of descent, is e~ 2ut . Conditioning on the 
time to coalescence then gives 

-ivtdFppitip) 



fpp(p) = J Q e dt dt 

with similar expressions for Jpq(p) and /qq(p)- Integration by parts then 
shows that, under the diffusion approximation, the probabilities of identity 
satisfy 

= -2vfp P + '-^ + (ff + rq) (f PQ - f PP ) 
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+ 2 (-VlP + ^29 + spq)f' PP + -£P<lfpp, 
= -2^/pg + ^ (— + rp) (f P P - f PQ ) 

(24) +^(^+^)(/qq-/pq) 

+ « (~MiP + ^29 + spq)f'p Q + -jPqfpQ, 

= -M Q g + ^^ + (^ + rp) (/Pg - /gg) 

1 i 

+ 2 ( _ MiP + V2Q + spq)f' QQ + -gpqfQQ- 

We now identify the probabilities of identity in state as the minimal solu- 
tion to this system. Again following Feller [(1969), Section X.3], {Fpp(t,p), 
FpQ(t,p), Fqq(1;,p)}, the minimal solution to (23) is most easily constructed 
via an iterative procedure. At the nth stage of the iteration, the functions 
{Fpp(t,p), FpMtjp), Fgg(t,p)} are obtained by conditioning the number 
of jumps that the process can make by time t to be at most n. Since (by 
our work of Section 4) the total number of jumps that the process can 
make by time t is finite, as n — ► oo this sequence of functions really does 
converge to the distribution function of the coalescence times. If we now 
define 

,(»)_ r -2ut dF^(t,p) 

fpp ~J e It dt ' 

with parallel definitions for fpX and fnn-, then as n — ► oo, {fpp(p), fpnip), 

fnoip)} converges to the minimal solution to the system of equations (24). 
Combining the above yields the following. 

Theorem 6.1. Under the diffusion approximation, if the process of al- 
lele frequencies is assumed to have reached stationarity, then the probabilities 
of identity in state for a sample of size two whose types at the selected site are 
known, denoted {J 'pp(p) , J pq{p) , J qq(p)} , are given by the minimal solution 
to the system of equations (24) . 

Before exploring the equations numerically, we make precise the iteration 
that we used above to construct the minimal solution. The sequence of func- 
tions {/pp,/pg,/gg}n>0 is obtained as follows. First set (fp P {p), /pg(p), 
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/^(p)) = (0,0,0) for p € [0, 1]. Then for n > 1, 

= -2,/g + i^S + (M + „) (/ <-> _ ;<»)) 

, X C i i \ d f(«) , X d<2 tin) 

+ gC-^iP + V2Q + spq)-j-fy + -pq—fy, 



1 



tex ffl v f M)_f(^ 



° = -*$ + ^ + (M + „) (/ ^> _ 4"), 

As n — > oo , the functions f P p , /pg and f^k converge (monotonically) to 
the minimal solution of (24). 

Since the system (24), arose by integrating the Kolmogorov backward 
equations (23), the boundary conditions are implicitly prescribed. However, 
in order to solve the equations numerically, we require explicit expressions. 
The first thing that we must check is that 

-.An) j f (n) -,2 An) J2. f ( n ) 

dfpp gjpQ drfpp d Jpq 

dp dp dp 2 dp 2 

all tend to as p tends to and similarly, 

tito #00 d2 fpo <*7oo 

all tend to as p tends to 1. The method is lengthy, but completely standard. 
For each equation, first use the Frobenius method of solution in series to find 
two linearly independent solutions to the corresponding homogeneous equa- 
tion. (One solution will be singular at p = and the other will not.) Then 
use the method of variation of parameters to write down the corresponding 
Green's function and finally integrate to obtain the solution to the origi- 
nal (inhomogeneous) equation. We omit the details. They can be found in 
any standard text on ordinary differential equations, for example, Simmons 
(1974). 
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Granted the above, we can now read off the boundary conditions from 
the equations. Letting p tend to in the first equation and to 1 in the last 
equation of (24) gives 



f M 1 + Wfe-^O) f (n) f1 l + Wfe-^l) 

fpp[0) ~ TT2^ ' f w {1) ~ TT2^ ' 



Letting p tend to 1 in the first equation and to in the last equation of (24) 
we see that we must have 

j An) fjA n ) 

(l + 4,)/S(l) + W fe(l) = l, (l + 4,)/(>)-^ 2 ^(0) = l. 

The second equation yields 

/$(0) = /&- 1} (0), /^(1) = /Sq 1) (1)- 



Remark 6.2. Our system of differential equations is for a sample of size 
two from a population that can be in just two possible states. Clearly this is 
a very special situation. Of course it is readily extended to larger systems. 
However, to characterize the transition probabilities for a sample of size 
TV" from a population with m possible states requires X)n=i X)Jj£=i ( T ) ( i _ i ) 
equations. The distribution of the time to the most recent common ancestor 
of the sample requires J2n=2Y^k=i( T k )(fe- i ) equations. The probability of 
identity in state for a sample of size two from a population distributed 
amongst m genetic backgrounds requires m(m + 3)/2 equations. Evidently 
for large samples or complex genetic backgrounds the approach will become 
intractable. 



7. Numerical examples. In this section we illustrate the accuracy of (24) 
when compared to a direct solution of the matrix equations (1) and (2) 
that gave us the exact probabilities of identity for the Wright-Fisher model. 
We then use the equations to illustrate the potentially important influence 
of the fluctuations on the probabilities of identity in allelic state at the 
neutral locus. We concentrate exclusively on the case of balancing selection 
s = sq(po — p) with pq = 1/2. We also set the recombination rate r = 0. 

These results could all be obtained for low selection rates using the ances- 
tral selection graph methods in which lineages branch into three potential 
ancestors at rate so- Here we used Mathematica to solve the differential 
equations numerically (a process that takes only seconds of computer time 
irrespective of the strength of selection). In all the figures we have included a 
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Fig. 1. Comparison of solutions to the matrix equations for identity in state obtained 
from the Wright-Fisher model to those of the differential equations obtained from the 
diffusion approximation. The three thin lines are (in descending order atp — 0) fpp, fgg 
and fpQ. The dotted lines are the solutions to the matrix equations and the bold line is 
the stationary distribution of p. The parameter values are N = 50, so = 0.16, po = 0.5, 
/Ui = 0.0005 = H2, ^ = 0.002. 

plot of the stationary distribution for the allele frequencies for the parameter 
values used. The unique stationary distribution for the process has density 

m(p)=/3^ 2 - 1 (l-p) 2 ^- 1 exp(-i So (p 2 + (l-p) 2 )), pe(0,l), 

where the constant j3 is chosen so that f m(p) dp = 1. Note that when the 
selection is very strong, rounding errors mean that the explosion of the 
density at the margins is not visible on the plot. 

Figure 1 compares the solution to the matrix equations (1) and (2) to 
the diffusion approximation obtained by solving the system (24). Even for 
the modest population size (fifty diploid individuals), the accuracy of the 
diffusion approximation is striking. We obtained similar results with other 




Fig. 2. From top to bottom at p = the plotted functions correspond to fpp, fcjQ, f 
and fpg. The thick line is the stationary distribution. The parameter values are v — 0.1, 
AH = 0.025 = Ate, po = 0.5, so = 0.16. 
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o i 

Fig. 3. The same as Figure 2 except that now sq = 0.32. 



parameter values. In this example the mutation rate between selected loci 
was chosen to be very small as this is the case when one expects the diffusion 
approximation to be most likely to break down. 

In Figures 2 and 3 the probabilities of identity are plotted for the case 
of strong balancing selection. In both cases the mean value of the allele 
frequencies at the selected locus when the population has reached station- 
arity is 1/2. The probability / =p 2 fpp + 2pqfpQ + q 2 fQQ of identity for 
two individuals selected at random is also plotted. This function should be 
compared to the constant value 0.43 that one obtains by setting p = ^ and 
using the standard structured coalescent model. Even when the strength of 
selection is rather strong, this is a poor approximation. In Figure 4 we plot 
the result of integrating the function / against the stationary distribution 
of the allele frequencies for different strengths of selection (all other param- 
eters being as in Figures 2 and 3). As we see the strength of selection has to 
be very strong indeed before the value predicted by the standard structured 
coalescent can be regarded as a good approximation. Finally, in Figure 5 we 
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Fig. 4. Prediction of the diffusion approximation for the integral of f against the station- 
ary distribution for the allele frequencies at the selected locus as a function of the strength 
of selection. All other parameters are chosen as in Figures 2 and 3. 
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Fig. 5. Predictions of the diffusion approximation for mean coalescence time scaled rel- 
ative to the neutral expectation (2N) for a sample of size two. The parameters are as in 
Figure 4. The thick line on the upper right is the prediction from the structured coalescent 

(6N). 

compare the mean time to coalescence for a sample of size two as a function 
of the strength of selection. 

We refer to the companion paper, Barton and Etheridge (2004), for a 
more detailed investigation and discussion of the biological issues raised. 
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